METHOD FOR CALCULATING HMM OUTPUT PROBABILITY AND SPEECH 
RECOGNITION APPARATUS 



BACKGROUND OF THE INVENTION 

1. Field of Invention 

[0001] The present invention relates to an HMM-output-probability
calculating method that, by performing a reduced number of calculations, can rapidly
calculate the output probability of an HMM (hidden Markov model) for use in
speech recognition. The invention also relates to a speech recognition apparatus.

2. Description of Related Art 

[0002] The HMM is widely used as a phoneme model for performing speech
recognition. While the HMM can provide high speech-recognition performance, it
has a problem in that it requires a great number of calculations. In particular, a large
number of calculations is required to find the output probability of the HMM.
Here, for input vector Y at a given time, when its output probability in a transition
from state i to state j is represented by bij(Y) and is assumed to obey an uncorrelated
normal distribution, bij(Y) can be expressed by expression (1), which is provided at
the end of the specification.

[0003] At this time, input vector Y can be expressed by n-dimensional (n is
a positive integer) components (LPC cepstrum, etc.) which are obtained by analyzing
an input sound at each point of time (time t1, time t2,...), for example, over a length of
20 msec. For example, when input vectors at times t1, t2, t3,... are represented by Y1,
Y2, Y3,..., input vector Y1 at time t1 is represented by (1y1, 1y2,..., 1yn), input vector
Y2 at time t2 is represented by (2y1, 2y2,..., 2yn), and input vector Y3 at time t3 is
represented by (3y1, 3y2,..., 3yn).

[0004] In expression (1), k represents the dimension index of input
vector Y at a time and takes any value from 1 to n. Also, σij(k) represents the
dispersion in the k-th dimension in the case of states i to j, and μij(k) represents the
average in the k-th dimension in the case of states i to j.

[0005] Although the output probability can be found by expression (1), an
underflow may occur when expression (1) is calculated in an unchanged form, since
the value obtained by the calculation is too small. Accordingly, the output probability
is normally found after taking logarithms. When the above expression (1) is expressed
by a logarithm having x as a base, it can be expressed by expression (2), which is
provided at the end of the specification.
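
As an illustration of the underflow problem, the following minimal Python sketch
compares the direct product of per-dimension normal densities with the sum of their
natural logarithms. The function names and the numeric values are hypothetical and are
chosen only so that the product underflows in double precision; they are not taken from
the specification.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of a one-dimensional normal distribution."""
    return math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(
        2.0 * math.pi * sigma ** 2)

def product_probability(ys, means, sigmas):
    """Direct product over the dimensions (uncorrelated components)."""
    p = 1.0
    for y, m, s in zip(ys, means, sigmas):
        p *= gaussian_pdf(y, m, s)
    return p

def log_probability(ys, means, sigmas):
    """Same quantity as a sum of natural logarithms."""
    total = 0.0
    for y, m, s in zip(ys, means, sigmas):
        total += -0.5 * math.log(2.0 * math.pi * s ** 2)
        total -= ((y - m) ** 2) / (2.0 * s ** 2)
    return total

# Assumed values: ten components, each 20 standard deviations from the mean.
ys, means, sigmas = [20.0] * 10, [0.0] * 10, [1.0] * 10
direct = product_probability(ys, means, sigmas)
logged = log_probability(ys, means, sigmas)
```

With these assumed values the direct product underflows to zero, while the logarithmic
form remains an ordinary finite number.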

[0006] In expression (2), calculation term A can be found by calculation
beforehand and is therefore represented by constant A. Also, $\log_x e$ in
calculation term B is a constant, so that by representing it by Z,
expression (2) can be expressed by expression (3), which is provided at the end of the
specification.
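
Expressions (1) to (3) themselves appear at the end of the specification and are not
reproduced here. Based only on the terms named in the text (an uncorrelated normal
distribution, constant term A, term B containing $\log_x e$, and Z), they presumably
take a form such as the following. The sign convention and the exact grouping of terms
A and B are assumptions.

```latex
% Expression (1), assumed form: uncorrelated n-dimensional normal distribution
b_{ij}(Y) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi\,\sigma_{ij}(k)^{2}}}
            \exp\!\left( -\frac{\{y_k-\mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}} \right)

% Expression (2), assumed form: logarithm to base x, with terms A and B
\log_x b_{ij}(Y) =
  \underbrace{\log_x \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi\,\sigma_{ij}(k)^{2}}}}_{A}
  \;-\;
  \underbrace{\sum_{k=1}^{n} \frac{\{y_k-\mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}}
              \cdot \log_x e}_{B}

% Expression (3), assumed form: with Z = \log_x e
\log_x b_{ij}(Y) = A - \sum_{k=1}^{n}
  \frac{\{y_k-\mu_{ij}(k)\}^{2}}{2\,\sigma_{ij}(k)^{2}} \cdot Z
```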

[0007] However, expression (3) still contains calculation term B', which requires
a large number of calculations, in other words,
$[\{y_k-\mu_{ij}(k)\}^2 / 2\sigma_{ij}(k)^2]\cdot Z$. In particular,
the above term B' needs to be calculated for each dimension of input vector Y at time
t. For example, when it is assumed that input vector Y at time t includes
ten-dimensional (n=10) components, it is required that, after performing ten
subtractions, ten multiplications, ten divisions, and ten additions, the result be
multiplied by constant Z, so that even in this example alone the number of calculations
is extremely large.
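
The operation count given above can be checked with a short Python sketch that
evaluates the sum of term B' directly and tallies each arithmetic operation. The
function name, the operation counters, and the assumption that the denominators
2σij(k)² are available beforehand are illustrative and not part of the specification.

```python
def term_b_prime_direct(y, mean, two_sigma_sq, z):
    """Direct evaluation of sum_k [{y_k - mu_k}^2 / (2 sigma_k^2)] * Z.

    Returns the value together with a tally of the operations performed.
    """
    ops = {"sub": 0, "mul": 0, "div": 0, "add": 0}
    total = 0.0
    for yk, mu, denom in zip(y, mean, two_sigma_sq):
        diff = yk - mu
        ops["sub"] += 1
        squared = diff * diff
        ops["mul"] += 1
        quotient = squared / denom
        ops["div"] += 1
        total = total + quotient
        ops["add"] += 1
    result = total * z  # constant Z is multiplied once at the end
    ops["mul"] += 1
    return result, ops

# Assumed ten-dimensional example with unit dispersion and Z = 1.
value, ops = term_b_prime_direct([1.0] * 10, [0.0] * 10, [2.0] * 10, 1.0)
```

For ten dimensions the tally is ten subtractions, ten divisions, ten additions, and
eleven multiplications (ten squarings plus the final multiplication by Z), for every
input vector and every distribution evaluated.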

[0008] This large number of calculations is a major impediment to providing
small-sized, light-weight, and low-priced products. It is
therefore impossible to perform speech recognition using the HMM as described
above with such hardware.

SUMMARY OF THE INVENTION 

[0009] Accordingly, an object of the present invention is to enable even a
system that has limited hardware capability to use the HMM by simplifying
calculations for finding the output probability of the HMM (hidden Markov
model) and enabling rapid finding of the output probability.

[0010] To achieve the above object, a method for calculating an HMM
output probability in accordance with the present invention uses an uncorrelated
normal distribution as an output-probability distribution. When an input vector at a
time is represented by Y (Y has n-dimensional components, and n is a positive
integer), the dispersion in the k-th dimension (k is any natural number from 1 to n) in
the case of states i to j is represented by σij(k), and the average in the k-th dimension
in the case of states i to j is represented by μij(k), a formula for finding the output
probability is given by expression (1), which is provided at the end of the
specification. When expression (1) is expressed, using the logarithms of both sides
thereof, by expression (2), which is provided at the end of the specification, the
number of calculations of the calculation term represented by
$[\{y_k-\mu_{ij}(k)\}^2 / 2\sigma_{ij}(k)^2]\cdot\log_x e$ in expression (2) is
reduced. The method in accordance with the invention includes the steps of creating a
code book in which the values in each dimension of input vector Y at a time are
represented by sets of representative values, replacing the values in the dimensions
(first to n-th dimensions) of input vector Y at the time with codes existing in the code
book, and finding, based on the codes, the calculation term represented by
$[\{y_k-\mu_{ij}(k)\}^2 / 2\sigma_{ij}(k)^2]\cdot\log_x e$ by referring to tables.
The tables are created for sets formed by collecting, from among the first to n-th
dimensions, dimensions capable of being treated as one set. In the table for each set,
output values found based on the codes selected for the dimensions in the set are
stored as total output values for all combinations of the codes, and each combination
of codes is associated with its total output value. Based on the codes selected
for the dimensions belonging to one set, the total output value which corresponds to
the combination of the codes for those dimensions is obtained by referring to the
table, and the output probability is calculated based on the total output values.

[0011] In the method for calculating the HMM output probability, when the
numbers of codes are allowed to differ among the dimensions, dimensions having
identical numbers of codes are collected to form each set.

[0012] In addition, a speech recognition apparatus in accordance with the
present invention includes a characteristic analyzer which performs characteristic
analyses on an input speech signal and which outputs, at each point of time, an input
vector having components composed of a plurality of dimensions, a scalar quantizer
which replaces the components by predetermined codes by performing scalar
quantization on the components received for the dimensions from the characteristic
analyzer, an arithmetic processor which finds an output probability by using output
values obtained by referring to tables which are created beforehand and uses the
output probability to perform the arithmetic operations required for speech
recognition, and a speech-recognition processor which outputs the result of
performing speech recognition based on the result of calculation performed by the
arithmetic processor. The tables are created for sets formed by collecting, from among
the first to n-th dimensions, dimensions capable of being treated as one set. In the
table for each set, output values found based on the codes selected for the dimensions
in the set are stored as total output values for all combinations of the codes, and each
combination of codes is associated with its total output value. Based on the codes
selected for the dimensions belonging to one set, the arithmetic processor obtains, by
referring to the tables, the total output value which corresponds to the combination of
the codes for those dimensions and calculates the output probability
based on the total output values.

[0013] As described above, according to the present invention, part of the
calculation to find the output probability of the HMM, particularly a part requiring a
great number of calculations, can be found by referring to tables, whereby the
arithmetic operations required to find the output probability can be simplified and the
output probability can be found at high speed and with a reduced amount of
processing. This enables even a low-processing-performance CPU to
perform the required calculations.

[0014] Also, in the present invention, the dimensions of an input vector are
grouped into several sets and a table is created for each set, so that the number of
tables can be reduced and post-processing for values obtained by referring to the
tables can be further simplified.

[0015] In addition, when collections of dimensions are formed, dimensions
whose corresponding code books have identical numbers of codes are collected to form
each set. This allows the number of codes to differ from dimension to dimension. For
example, when it is assumed that an input vector at a
time has ten-dimensional components, dimensions which have small
dimension numbers, such as the first dimension and the second dimension, need
relatively large numbers of codes in their code books, while dimensions
which have large dimension numbers, such as the eighth dimension, the ninth
dimension, and the tenth dimension, can have smaller numbers of codes in their code
books.

[0016] By way of example, when the number of codes is changed
depending on each dimension, such as sixteen codes for each of the code books
corresponding to the first dimension and the second dimension, eight codes for each of
the third dimension to the sixth dimension, and four codes for each of the seventh
dimension to the tenth dimension, the first dimension and the second dimension are
treated as a set and a table corresponding thereto can be created, the third dimension
to the sixth dimension are treated as a set and a table corresponding thereto can be
created, and the seventh dimension to the tenth dimension are treated as a set and a
table corresponding thereto can be created.
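
To make the table sizes in this example concrete, the following Python sketch counts
the code combinations in each of the three tables and compares the total with a single
table covering all ten dimensions. The variable and function names are hypothetical.
The counts are per set of distribution parameters, on the assumption that each table
stores one precomputed value per code combination.

```python
from math import prod

# Code-book sizes for dimensions 1 to 10, as in the example above.
codes_per_dimension = [16, 16, 8, 8, 8, 8, 4, 4, 4, 4]

# Zero-based dimension indices of the three sets (dimensions 1-2, 3-6, 7-10).
sets = [[0, 1], [2, 3, 4, 5], [6, 7, 8, 9]]

def table_entries(dimensions):
    """Number of code combinations, i.e. entries, in the table for one set."""
    return prod(codes_per_dimension[d] for d in dimensions)

entries_per_set = [table_entries(s) for s in sets]   # [256, 4096, 256]
total_entries = sum(entries_per_set)                 # 4608
single_table_entries = prod(codes_per_dimension)     # 268435456
```

Grouping by equal code count keeps the three tables at 256, 4096, and 256 entries,
whereas one table over all ten dimensions would need 268,435,456 entries.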

BRIEF DESCRIPTION OF THE DRAWINGS 
[0017] Fig. 1 is a schematic illustrating an embodiment of the present
invention and in particular processing for coding components in dimensions of input
vector Y1 at time t1;

Figs. 2(a) and (b) are schematics illustrating processing that, by using codes
obtained by coding with the code books shown in Fig. 1 to refer to tables, finds
calculation term D in expression (4) in accordance with the embodiment of the
invention;

Fig. 3 is a flowchart illustrating an outline process for finding the output
probability of HMM corresponding to an embodiment of the present invention;

Fig. 4 is a schematic illustrating processing (processing for a first dimension
and a second dimension) that, by referring to tables, finds calculation term D in
expression (4) in accordance with an embodiment of the present invention;

Fig. 5 is a schematic illustrating processing (processing for a third dimension
to a sixth dimension) that, by referring to tables, finds calculation term D in
expression (4) in accordance with an embodiment of the present invention;

Fig. 6 is a schematic illustrating processing (processing for a seventh
dimension to a tenth dimension) that, by referring to tables, finds calculation term D
in expression (4) in accordance with an embodiment of the present invention; and

Fig. 7 is a schematic briefly showing an embodiment of a speech recognition
apparatus in accordance with the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 
[0018] An embodiment of the present invention is described below. Details
described in this embodiment include a description of an HMM-output-probability
calculating method and a speech recognition apparatus of the present invention.

[0019] As described above, the present invention is intended to simplify
calculation for finding the output probability of an HMM. Specifically, the
present invention is intended to simplify this calculation by transforming calculation
term B' into a table and referring to the table so that calculation term B' can be found.



[0020] Details are described below. In this embodiment, it is assumed that
each of input vectors (represented by Y1, Y2, Y3,...) at times t1, t2, t3,... has
ten-dimensional components. For example, input vector Y1 at time t1 has components
1y1, 1y2,..., 1yn, input vector Y2 at time t2 has components 2y1, 2y2,..., 2yn, and
input vector Y3 at time t3 has components 3y1, 3y2,..., 3yn (here n = 10).

[0021] For each input vector, a code book is provided for each of the first to
tenth dimensions, and each code book represents the values in its dimension as a set of
representative values. This is shown in Fig. 1. Fig. 1 shows code books corresponding
to the components 1y1, 1y2,..., 1y10 of input vector Y1 at time t1. Input vector Y1
has code books 1C, 2C,..., 10C corresponding to the dimensions: code book 1C, which
has codes 1C1, 1C2,..., 1Cm for first dimensional component 1y1; code book 2C,
which has codes 2C1, 2C2,..., 2Cm for the second dimensional component; code book
3C, which has codes 3C1, 3C2,..., 3Cm for the third dimensional component; and so on
up to code book 10C, which has codes 10C1, 10C2,..., 10Cm for the tenth dimensional
component.

[0022] Here, it is assumed that the size of each code book is 64, so that m in
each code book is 64 and each code book has 64 codes. In practice, however, the
dimensions do not all need to have an equal number of codes, and the number of codes
can be changed for each dimension.

[0023] As described above, by having code books corresponding to the
dimensions, the ten-dimensional components of each of input vectors Y1, Y2,..., Yn at
times t1, t2,..., tn can be replaced by codes existing in the code books
corresponding to the dimensions.

[0024] By way of example, concerning a first dimensional value in input
vector Y1 at time t1, by selecting the closest value from code book 1C, it can be
replaced by the selected value. Similarly, concerning a second dimensional value in
input vector Y1, by selecting the closest value from code book 2C, it can be replaced
by the selected value.
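
The selection of the closest code-book value can be sketched in Python as follows. The
function names and the small code books are hypothetical, and the returned code numbers
are zero-based, whereas the codes in the description are numbered from 1.

```python
def quantize(value, code_book):
    """Return the index of the code-book entry closest to value."""
    return min(range(len(code_book)), key=lambda c: abs(code_book[c] - value))

def quantize_vector(y, code_books):
    """Scalar-quantize each component with the code book of its own dimension."""
    return [quantize(yk, cb) for yk, cb in zip(y, code_books)]

# Assumed two-dimensional example with one small code book per dimension.
code_books = [[-1.0, 0.0, 1.0], [0.0, 2.0]]
codes = quantize_vector([0.4, 1.7], code_books)
```

Each dimension is quantized independently, which is what allows the code books to have
different sizes for different dimensions.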

[0025] Similarly, a component in each dimension of an input vector at each
point of time can be replaced by any of the m codes in the code book corresponding
thereto. For example, the value of first dimensional component 2y1 in input
vector Y2 at time t2 can also be replaced by the closest value selected from code
book 1C.

[0026] After replacing each component by one of the codes in the corresponding
code book, expression (3), which is provided at the end of the specification, can be
expressed as expression (4), which is also provided at the end of the specification.

[0027] In expression (4), kCc represents a replaced code, k represents a
dimension and takes any value from 1 to n when there are n dimensions, and c
represents a code number and takes any value from 1 to m when the code book size is
m.

[0028] Calculation term D in expression (4) can be found beforehand as a value
for each code in each dimension, whereby an output table is created. Specifically,
calculation term D in expression (4) includes "σij(k)", "μij(k)", "kCc", and "Z". As
described above, "σij(k)" represents the dispersion in the k-th dimension in the case of
states i to j, and "μij(k)" represents the average in the k-th dimension in the case of
states i to j. These can be found beforehand by calculation, and Z is a constant. Since
kCc is a code which exists in code books 1C, 2C,..., nC corresponding to the
dimensions, its value is also known beforehand.

[0029] By using "σij(k)" and "μij(k)" in each dimension, "Z", which is a
constant, and the m codes existing in the dimension, calculation term D
can be calculated. For example, by calculating calculation term D in expression (4)
beforehand for codes 1C1 to 1Cm (m=64) in code book 1C, the results
(here called output values) which respectively correspond to codes 1C1 to 1Cm are
obtained and are formed as a table (this is called first dimensional table T1).

[0030] In addition, by calculating calculation term D in expression (4)
beforehand for codes 2C1 to 2Cm (m=64) in code book 2C, the output values which
respectively correspond to codes 2C1 to 2Cm are obtained and are formed
as a table (this is called second dimensional table T2).

[0031] In this manner, n output tables (T1 to Tn) corresponding to
dimensions 1 to n are created, and each of these tables T1 to Tn has m output values
corresponding to m codes in each of dimensions 1 to n.

[0032] Accordingly, by obtaining input vector Y at a time and scalar-quantizing
its component in each dimension, a code for each
dimension can be obtained, and by referring, based on the code, to the table
corresponding to the dimension, calculation term D in expression (4) can be found.

[0033] By way of example, when code 1C3 is selected correspondingly to
first dimensional component 1y1 of input vector Y1 at time t1, the code 1C3 is used to
refer to first dimensional table T1, whereby a first dimensional output value in
calculation term D is found, and similarly, when code 2C16 is selected
correspondingly to second dimensional component 1y2 of input vector Y1 at time t1,
the code 2C16 is used to refer to second dimensional table T2, whereby a second
dimensional output value in calculation term D is found.

[0034] This is described using Figs. 2(a) and 2(b). As shown in Fig. 2(a), a
first dimensional component of input vector Y1 at time t1 is represented by 1y1, and it
is assumed that, based on 1y1, code 1C3 is selected from code book 1C. Next, based
on code 1C3, by referring to first dimensional output table T1, output value 1O3 is
obtained. This output value 1O3 is found beforehand from calculation term D in
expression (4).

[0035] Similarly, as shown in Fig. 2(b), a second dimensional component of
the same input vector Y1 is represented by 1y2, and it is assumed that, based on 1y2,
code 2C16 is selected from code book 2C. Next, based on this code 2C16, by
referring to second dimensional table T2, output value 2O16 is obtained. This output
value 2O16 is found beforehand from calculation term D in expression (4).

[0036] By repeating the above procedure, an output value in each
dimension is obtained. The output values for the dimensions (here represented by
O1, O2, O3,..., On for brevity of description) are
added together (O1 + O2 +...+ On). The sum O1 + O2 +...+ On is the result of
calculating calculation term E in expression (4). Thus, by using the sum to calculate
the entirety of expression (4), output probability bij(Y) for input vector Y at a time is
found.
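
The table lookup and summation described above can be sketched in Python as follows.
The final combination with constant A assumes that expression (4) has the form
A minus term E; expression (4) itself is at the end of the specification, so this sign
convention, like the function names and values, is an assumption.

```python
def term_e(tables, codes):
    """Add the output values read from the per-dimension tables."""
    return sum(table[code] for table, code in zip(tables, codes))

def log_output_probability(a, tables, codes):
    """Assumed form of expression (4): constant A minus calculation term E."""
    return a - term_e(tables, codes)

# Assumed output tables for two dimensions and zero-based codes for one input vector.
tables = [[0.5, 0.0, 2.0], [2.0, 2.0]]
codes = [2, 0]
e_value = term_e(tables, codes)
log_b = log_output_probability(-1.0, tables, codes)
```

At run time the only arithmetic left is the additions, since every squared difference
and division has already been folded into the tables.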

[0037] With the above processing, the calculation required for
obtaining term E in expression (4) consists of referring to a code book once for
each dimension, referring, based on the result, to the corresponding
table among tables T1 to Tn, and adding the obtained output values together. For
example, in the case of ten dimensions, the only processing required is to refer to a
code book ten times (once for each dimension), refer to a table ten times, and add the
ten obtained output values together.

[0038] The foregoing is a description of the basic processing underlying the
present invention, in which calculation term D in expression (4) is obtained by
referring to tables T1 to Tn created for the dimensions, and calculation term E in
expression (4) is obtained by adding the output values for the dimensions together.
Although such a calculation technique may be found in the related art, the present
invention develops this approach further and enables the efficient
finding of an output probability by using a reduced number of calculations. The
following is a description of the present invention.

[0039] The present invention does not have the above-described output table 
for each dimension and instead groups dimensions, which can be treated as the same 
set, among dimensions 1 to n, and has tables for the sets. 

[0040] Fig. 3 is a flowchart showing the outline of an output probability
calculating process of the present invention. Among dimensions 1 to n of an input
vector, dimensions which can be treated as one set are grouped into several sets (step
s1), and a table is created for each set (step s2). When calculating an output
probability, codes which correspond to the components for the first dimension to the
n-th dimension of the input vector are sequentially obtained from the corresponding
code books (step s3), and based on the codes, by referring to the corresponding tables,
output values for the tables are obtained (step s4). By substituting the output values
from the tables into the calculating expression (expression (4)), the output probability
is found (step s5). Details are described below.

[0041] When a code book in which the components for each dimension of
the input vector at each point of time are scalar-quantized is created, the number of
codes can be changed for each dimension, as described above. For example, when it
is assumed that an input vector at each point of time has ten-dimensional components,
each of dimensions which have small dimension numbers, such as the first dimension
and the second dimension, may have a larger number of codes in the code book, while
each of dimensions which have large dimension numbers, such as the eighth
dimension, the ninth dimension, and the tenth dimension, may have a smaller number
of codes in the code book.

[0042] In a case in which the input vector is a ten-dimensional LPC
cepstrum coefficient vector, one example is that the number of codes is changed as
follows: sixteen codes for each of the code books corresponding to the first dimension
and the second dimension, eight codes for each of the code books corresponding to
the third dimension to the sixth dimension, and four codes for each of the code books
corresponding to the seventh dimension to the tenth dimension.

[0043] Accordingly, when each set is generated, it is convenient to group
dimensions having the same number of codes. In this case, the number of codes for
both the first dimension and the second dimension is sixteen, the number of codes for
the third dimension to the sixth dimension is eight, and the number of codes for the
seventh dimension to the tenth dimension is four. For convenience of description, a set
for the first dimension and the second dimension is called first set V1, a set for the
third dimension to the sixth dimension is called second set V2, and a set for the
seventh dimension to the tenth dimension is called third set V3.

[0044] For these first to third sets V1, V2, and V3, tables (first table T10,
second table T20, third table T30) are created.

[0045] First table T10 stores, for every combination of the codes existing in
code books 1C and 2C, the sum (1Om + 2Om) of an output value (represented
by 1Om) which corresponds to a code (represented by 1Cm) selected from code
book 1C for first dimensional component y1 of input vector Y at a time
and an output value (represented by 2Om) which corresponds to a code
(represented by 2Cm) selected from code book 2C for second
dimensional component y2 of input vector Y at the same time.

[0046] In other words, calculation term D in expression (4) is found for all the
codes existing in code book 1C and for all the codes existing in code book 2C.
Each output value 1Om + 2Om, which corresponds to each combination of the codes
existing in code books 1C and 2C, is calculated beforehand.
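
The construction of such a table for one set can be sketched in Python as follows. The
table is represented here as a dictionary keyed by the tuple of zero-based codes; this
representation, the function name, and the numeric values are assumptions made for
illustration only.

```python
from itertools import product

def build_set_table(dimension_outputs):
    """Store the total output value for every combination of codes in one set.

    dimension_outputs holds, for each dimension of the set, the list of
    precomputed term-D output values (one value per code).
    """
    code_ranges = [range(len(outputs)) for outputs in dimension_outputs]
    return {
        codes: sum(outputs[c] for outputs, c in zip(dimension_outputs, codes))
        for codes in product(*code_ranges)
    }

# Assumed output values for a set of two dimensions with 2 and 3 codes.
first_dimension = [0.5, 0.0]
second_dimension = [2.0, 1.0, 3.0]
table = build_set_table([first_dimension, second_dimension])
```

The table has one entry per code combination (here 2 × 3 = 6), so a single lookup
replaces one lookup per dimension followed by the additions within the set.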

[0047] This is specifically described using Fig. 4.

[0048] By way of example, it is assumed that code 1C3 is selected from
code book 1C correspondingly to first dimensional component 1y1 of input vector Y1
at time t1 and it is assumed that code 2C16 is selected from code book 2C
correspondingly to second dimensional component 1y2 of input vector Y1 at the same
time.

[0049] At this time, it is assumed that calculation term D in expression (4)
which corresponds to code 1C3 has already been found and the output value thereof is
1O3 and it is assumed that calculation term D in expression (4) which corresponds to
code 2C16 has already been found and the output value thereof is 2O16. Also, the
sum "1O3 + 2O16" of these output values has already been calculated (represented by
1O3 + 2O16 = O1), and the sum is stored in first table T10. Accordingly, by using
code 1C3 selected for first dimensional component 1y1 and code 2C16 selected for
second dimensional component 1y2 to refer to first table T10, "1O3 + 2O16", in other
words, "O1" is read in this case.

[0050] This applies to the output tables for the other sets. For each dimension
belonging to each set, calculation term D in expression (4) has already been found for
all codes existing in the dimension, and a total output value has been calculated for
each combination of the codes in the dimensions belonging to the set.

[0051] By way of example, as shown in Fig. 5, it is assumed that for third
dimensional component 1y3 of input vector Y1 at time t1, code 3C5 is selected from
code book 3C and it is assumed that for fourth dimensional component 1y4 of input
vector Y1 at the same time, code 4C6 is selected from code book 4C. In addition, it is
assumed that for fifth dimensional component 1y5 of input vector Y1 at the same
time, code 5C7 is selected from code book 5C and it is assumed that for sixth
dimensional component 1y6 of input vector Y1 at the same time, code 6C8 is selected
from code book 6C.

[0052] At this time, it is assumed that for code 3C5, 3O5 has been found as
the output value by using calculation term D in expression (4), it is assumed that for
code 4C6, 4O6 has been found as the output value by using calculation term D in
expression (4), it is assumed that for code 5C7, 5O7 has been found as the output
value by using calculation term D in expression (4), and it is assumed that for code
6C8, 6O8 has been found as the output value by using calculation term D in
expression (4).

[0053] In addition, 3O5 + 4O6 + 5O7 + 6O8 has already been calculated
(represented by 3O5 + 4O6 + 5O7 + 6O8 = O2), and the sum is stored in second table
T20. Accordingly, by referring to table T20 based on code 3C5 for the third
dimensional component, code 4C6 for the fourth dimensional component, code 5C7
for the fifth dimensional component, and code 6C8 for the sixth dimensional
component, "3O5 + 4O6 + 5O7 + 6O8", in other words, "O2" is read as an output
value in this case.

[0054] As shown in Fig. 6, it is assumed that for seventh dimensional
component 1y7 of input vector Y1 at time t1, code 7C1 is selected from code book 7C
and it is assumed that for eighth dimensional component 1y8 of input vector Y1 at the
same time, code 8C2 is selected from code book 8C. Also, it is assumed that for ninth
dimensional component 1y9 of input vector Y1 at the same time, code 9C3 is selected
from code book 9C and it is assumed that for tenth dimensional component 1y10 of
input vector Y1 at the same time, code 10C4 is selected from code book 10C.

[0055] At this time, it is assumed that for code 7C1, 7O1 has been found as
the output value thereof by using calculation term D in expression (4), it is assumed
that for code 8C2, 8O2 has been found as the output value thereof by using calculation
term D in expression (4), it is assumed that for code 9C3, 9O3 has been found as the
output value thereof by using calculation term D in expression (4), and it is assumed
that for code 10C4, 10O4 has been found as the output value thereof by using
calculation term D in expression (4).

[0056] In addition, 7O1 + 8O2 + 9O3 + 10O4 has already been calculated
(represented by 7O1 + 8O2 + 9O3 + 10O4 = O3), and the sum is stored in third table
T30. Accordingly, by referring to third table T30 based on code 7C1 for the seventh
dimensional component, code 8C2 for the eighth dimensional component, code 9C3
for the ninth dimensional component, and code 10C4 for the tenth dimensional
component, "7O1 + 8O2 + 9O3 + 10O4", in other words, "O3" is read in this case.

[0057] As described above, the first to tenth dimensions are divided into
three sets (first set V1, second set V2, third set V3), and the first set V1, the second
set V2, and the third set V3 have tables (first table T10, second table T20, third table
T30), respectively. Thus, the codes selected from the code books for the dimensions
belonging to each of the first, second, and third sets V1, V2, and V3 are input to the
corresponding table, whereby a total output value is obtained
which corresponds to the input combination of codes.

[0058] Thereby, if the output value from table T10 corresponding to first set
V1 is O1, the output value from table T20 corresponding to second set V2 is O2, and
the output value from table T30 corresponding to third set V3 is O3, totaling the three
output values O1, O2, and O3 completes the calculation of calculation term E in
expression (4). Specifically, in the case of ten dimensions, the processing required for
calculation term E in expression (4) consists of ten code-book references,
three table references, and the addition of the three table outputs, so that the number of
calculations can be greatly reduced.
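
The grouped lookup can be sketched in Python as follows. Each set table is stored here
as a flat list indexed by a number computed from the codes of the set; this indexing
scheme, the function names, and the small four-dimensional example are assumptions for
illustration, not the layout used in the embodiment.

```python
from itertools import product
from math import prod

def flat_index(codes, counts):
    """Map one combination of codes to a position in a flat table."""
    index = 0
    for code, count in zip(codes, counts):
        index = index * count + code
    return index

def build_set_table(dimension_outputs):
    """Precompute the total output value for every code combination of one set."""
    counts = [len(outputs) for outputs in dimension_outputs]
    table = [0.0] * prod(counts)
    for codes in product(*(range(n) for n in counts)):
        table[flat_index(codes, counts)] = sum(
            outputs[c] for outputs, c in zip(dimension_outputs, codes))
    return table, counts

def term_e_grouped(set_tables, set_dimensions, codes):
    """One table reference per set, then add the per-set totals."""
    total = 0.0
    for (table, counts), dims in zip(set_tables, set_dimensions):
        total += table[flat_index([codes[d] for d in dims], counts)]
    return total

# Hypothetical term-D output values for four dimensions grouped into two sets.
per_dimension = [[0.0, 1.0], [2.0, 3.0], [0.5, 1.5, 2.5], [4.0, 8.0]]
set_dimensions = [[0, 1], [2, 3]]
set_tables = [build_set_table([per_dimension[d] for d in dims])
              for dims in set_dimensions]
codes = [1, 0, 2, 1]
grouped = term_e_grouped(set_tables, set_dimensions, codes)
per_dimension_sum = sum(t[c] for t, c in zip(per_dimension, codes))
```

The grouped result equals the sum obtained from the per-dimension tables, but it is
reached with one table reference and one addition per set rather than per dimension.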

[0059] After finding calculation term E in expression (4), by using it to
calculate the entirety of expression (4), the output probability for the transition from
state i to state j of an input vector at time t1 can be found.

[0060] As described above, the processing required for finding calculation
term E in expression (4) in the case of finding the output probability can be simplified
to obtaining a code for each dimension, using the obtained codes to refer to
a few (three in the above case) tables, each of which covers a set of dimensions, and
adding the obtained values together, so that the number of calculations is greatly
reduced and the processing can be adequately performed by a CPU having low
processing performance.

[0061] The foregoing description concerns one normal distribution.
However, the present invention can be applied to a mixed continuous distribution
HMM using a plurality of normal distributions. This is described below.

[0062] In the present invention, by preparing m coded normal distributions
(m typical normal distributions are coded and processed in the form of a code book,
and the dispersion and average of the normal distribution corresponding to each code
are found beforehand) which are common to all phonemes, the probability of
outputting a phoneme in dimension yk of input vector Yt at time t in state ij can be
found.

[0063] The m (e.g., 256) coded normal distributions can be used in common 
for all phonemes. The probability of outputting a phoneme for each state ij can be 
found by selecting a plurality of normal distributions from the code book, as required. 
When the output probability in the case of using the m-th normal distribution is 
represented by bm(Y), bm(Y) can be expressed by expression (5), which is provided at 
the end of the specification. 

[0064] In expression (5), Wijm is the weight for the m-th normal distribution 
in state ij for each phoneme. 

[0065] Using logarithms, as described above, expression (5) can be expressed by 
expression (6), which is also provided at the end of the specification. In 
expression (6), calculation term A can be found beforehand by calculation, so it is 
treated as a constant and is represented by A. Also, logx e in calculation term B is 
a constant. When this constant is represented by Z, expression (6) can be 
expressed by expression (7), which is provided at the end of the specification. 

[0066] In addition, when yk in calculation term B' in expression (7) is coded 
using scalar quantization, as described above, and is represented by kCc, expression 
(7) can be expressed by expression (8), which is provided at the end of the 
specification. 

[0067] In expression (7) and expression (8), logx Wijm can be treated as a 
constant since it can be found beforehand by calculation. 

[0068] Calculation term D in expression (8) is processed in the form of a 
table so that it can be found by referring to the table. This has already been fully 
described, so the description is omitted here. 
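
The logarithmic form used for one weighted normal distribution can be checked numerically with the following Python sketch. It is an illustration under stated assumptions, not a reproduction of expressions (5) to (8), which appear only at the end of the specification: it assumes that bm(Y) is the weight Wijm multiplied by an uncorrelated normal density, that the logarithm base x is 10, and that the function names (log_component, direct_component) are hypothetical.

```python
import math

X = 10.0                 # assumed logarithm base x
Z = math.log(math.e, X)  # the constant log_x(e)


def log_component(y, weight, mean, var):
    """log_x of weight times the product of per-dimension normal densities."""
    # Analogue of calculation term A: depends only on the dispersions,
    # so it can be computed beforehand and stored as a constant.
    a = sum(math.log(1.0 / math.sqrt(2.0 * math.pi * v), X) for v in var)
    # Remaining sum: squared distances from the averages, scaled by dispersion.
    b = sum((yk - mu) ** 2 / (2.0 * v) for yk, mu, v in zip(y, mean, var))
    # log_x(weight) is also a constant that can be computed beforehand.
    return math.log(weight, X) + a - Z * b


def direct_component(y, weight, mean, var):
    """The same quantity computed directly, without logarithms, for comparison."""
    p = weight
    for yk, mu, v in zip(y, mean, var):
        p *= math.exp(-(yk - mu) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return p
```

Only the sum scaled by Z depends on the input vector, which is why that part is the one replaced by code-book and table references. 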

[0069] As described above, in the present invention, by preparing m coded 
normal distributions which are common to all phonemes, the probability of outputting 
a phoneme in a dimension of an input vector at a time in state ij can be easily found. 

[0070] Fig. 7 shows a schematic structure of a speech recognition apparatus 
to which the above-described HMM output probability calculating method is applied. 
The structure includes: a microphone 11 as a speech input unit; a speech-signal 
input processor 12 that amplifies and A/D-converts an input speech signal; a 
characteristic analyzer 13 that performs characteristic analysis on the speech 
signal processed by the speech-signal input processor 12 and outputs, for each time 
t, an input vector and its components (e.g., a ten-dimensional LPC cepstrum); a 
scalar quantizer 15 which receives the component for each dimension from the 
characteristic analyzer 13 and which performs scalar quantization by referring to 
the code books for the respective dimensions, which are stored in a code-book 
storage unit 14 (e.g., code books 1C, 2C,..., 10C shown in Fig. 1); an arithmetic data 
storage unit 16 that stores various types of data, such as various parameters (aij, 
uij, etc., in dimension k in the case of state ij) which are necessary for HMM 
calculation, and the above tables (first, second, and third tables T10, T20, and T30); 
an arithmetic processor 17 that performs HMM calculation based on the various types 
of data stored in the arithmetic data storage unit 16 and the quantized result from 
the scalar quantizer 15 (the code selected from the code book corresponding to the 
component in each dimension); and a speech-recognition processor 19 that outputs the 
result of performing speech recognition using the result calculated by the 
arithmetic processor 17 and a word-language table 18. 

[0071] In the process for finding the output probability, which is one of 
the processes performed by the arithmetic processor 17, the tables (first, second, 
and third tables T10, T20, and T30 in the above example) are referred to based on 
the codes selected by the scalar quantizer 15, so that an output value is found for 
each table, as described above. These values are used to calculate the output 
probability, and the output probability is used to obtain a final output likelihood. 
This is performed for each phoneme at each point of time. The speech-recognition 
processor 19 receives the output from the arithmetic processor 17 and performs 
speech-recognition processing on the input speech. 
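
The code selection performed by the scalar quantizer 15 can be sketched in Python as follows. This is a minimal sketch under assumptions: it takes scalar quantization to mean choosing, for each dimension, the code whose represented value is nearest to the component, and the function names (quantize, quantize_vector) and the code-book values in the usage note are hypothetical.

```python
def quantize(value, code_book):
    """Return the index of the code-book entry nearest to the given value."""
    return min(range(len(code_book)), key=lambda c: abs(code_book[c] - value))


def quantize_vector(y, code_books):
    """Scalar-quantize each component with the code book for its dimension."""
    return [quantize(yk, cb) for yk, cb in zip(y, code_books)]
```

For example, with the assumed code books [-1.0, 0.0, 1.0] and [0.0, 2.0, 4.0], the two-dimensional input (0.4, 3.1) is coded as [1, 2]. The resulting codes are the only information the table references need. 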

[0072] As described above, the speech recognition apparatus of the present 
invention finds the output probability by using the HMM output probability 
calculating method described with reference to Fig. 1 to Fig. 6. This greatly 
simplifies the calculation for finding the output probability, so that the output 
probability can be found at high speed, even by a CPU having low processing 
performance. Thus, inexpensive hardware can be used to perform HMM-based speech 
recognition. 

[0073] The present invention is not limited to the above-described 
embodiment, and instead can be variously modified without departing from the spirit 
of the present invention. For example, although the embodiment has described speech 
recognition, the present invention can also be applied to pattern recognition in the 
case of performing image recognition. In addition, a processing program that 
describes a process for implementing the above-described HMM-output-probability 
calculating method of the present invention may be created and recorded on a 
recording medium such as a floppy disk, optical disk, or hard disk. The present 
invention also includes the recording medium. Alternatively, the processing program 
may be acquired from a network. 

[0074] As described above, according to the present invention, part of the 
calculation for finding the probability of outputting the HMM, particularly the part 
requiring a great number of calculations, can be found by referring to tables. The 
arithmetic operations required for finding the output probability are thereby 
simplified, and the output probability can be found at high speed and with a reduced 
amount of processing. This enables even a low-processing-performance CPU to perform 
the processing adequately. Also, in the present invention, the dimensions of an input 
vector are grouped into several sets and a table is created for each set, so that the 
number of tables can be reduced and post-processing for values obtained by referring 
to the tables can be further simplified. 

[0075] Accordingly, the present invention greatly simplifies calculation for 
finding the probability of outputting the HMM so that the output probability can be 
found at high speed, and enables even a low-processing-performance CPU to function 
sufficiently. 



